Transformer in Transformer Supplemental Material
We can see that for both DeiT-S and TNT-S, more patches are related as the layer goes deeper. An MLP is used to calculate the attention values, and the attention is multiplied with all the embeddings.

We extract the features from different layers of TNT to construct multi-scale features. The COCO2017 val results are shown in Table 2. TNT achieves much better detection performance.

Table 2: Results of Faster R-CNN object detection on COCO minival set with ImageNet pre-training.
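As a rough illustration of how per-layer transformer features can be turned into multi-scale maps for a detector, the sketch below reshapes each layer's sequence of patch embeddings back into a 2-D feature map. The function name, the 14x14 patch grid, and the 384-dim embedding size are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_size):
    """Reshape a sequence of patch embeddings (N, D) into a
    channels-first 2-D feature map (D, H, W), the layout a
    detection neck such as FPN typically consumes.
    (Hypothetical helper; layer choice and grid size are assumed.)"""
    n, d = tokens.shape
    assert n == grid_size * grid_size, "token count must match the patch grid"
    # (N, D) -> (H, W, D) -> (D, H, W)
    return tokens.reshape(grid_size, grid_size, d).transpose(2, 0, 1)

# Hypothetical outputs from four different TNT layers:
# a 14x14 patch grid with 384-dim embeddings per patch.
layer_outputs = [np.random.randn(14 * 14, 384) for _ in range(4)]
multi_scale = [tokens_to_feature_map(t, 14) for t in layer_outputs]
print([fm.shape for fm in multi_scale])  # four (384, 14, 14) maps
```

In practice these maps would then be resized or strided to different resolutions before being fed to the detection head; the sketch only shows the sequence-to-map conversion step.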